[WIP] 🌱 e2e: change behavior of VerifyMachinesReady to verify once after a successful list of machines by chrischdi · Pull Request #13015 · kubernetes-sigs/cluster-api

chrischdi · 2025-11-20T12:46:18Z

What this PR does / why we need it:

VerifyMachinesReady is used in places where we expect a Cluster to be in a certain state.

Instead of checking once whether the Machines' Ready conditions are true, the function currently waits up to five minutes for all Machines' Ready conditions to become true.

The logic currently is as follows:

cluster-api/test/framework/cluster_helpers.go

Lines 490 to 515 in 8dcfbaf

    
    func VerifyMachinesReady(ctx context.Context, input VerifyMachinesReadyInput) {
    	machineList := &clusterv1.MachineList{}

    	// Wait for all machines to have Ready condition set to true.
    	Eventually(func(g Gomega) {
    		g.Expect(input.Lister.List(ctx, machineList, client.InNamespace(input.Namespace),
    			client.MatchingLabels{
    				clusterv1.ClusterNameLabel: input.Name,
    			})).To(Succeed())

    		g.Expect(machineList.Items).ToNot(BeEmpty(), "No machines found for cluster %s", input.Name)

    		for _, machine := range machineList.Items {
    			readyConditionFound := false
    			for _, condition := range machine.Status.Conditions {
    				if condition.Type == clusterv1.ReadyCondition {
    					readyConditionFound = true
    					g.Expect(condition.Status).To(Equal(metav1.ConditionTrue), "The Ready condition on Machine %q should be set to true; message: %s", machine.Name, condition.Message)
    					g.Expect(condition.Message).To(BeEmpty(), "The Ready condition on Machine %q should have an empty message", machine.Name)
    					break
    				}
    			}
    			g.Expect(readyConditionFound).To(BeTrue(), "Machine %q should have a Ready condition", machine.Name)
    		}
    	}, 5*time.Minute, 10*time.Second).Should(Succeed(), "Failed to verify Machines Ready condition for Cluster %s", klog.KRef(input.Namespace, input.Name))
    }

This PR changes the behavior so that only the list API call is retried; the Ready conditions are then verified once against the successfully listed Machines.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #

/area e2e-testing


k8s-ci-robot · 2025-11-20T12:46:26Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign neolit123 for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

sbueringer · 2025-11-20T15:37:00Z

test/framework/cluster_helpers.go

-				}
+	}, 5*time.Minute, 10*time.Second).Should(Succeed(), "Failed to list Machines to check the Ready condition for Cluster %s", klog.KRef(input.Namespace, input.Name))
+
+	Expect(machineList.Items).ToNot(BeEmpty(), "No machines found for cluster %s", input.Name)


I'm not sure this is desirable as-is, considering the Prow CI infrastructure.

I would expect a certain amount of flakiness purely because of the infra we run on.

We can give this a try early next cycle though, and then keep an eye on k8s-triage.

An alternative would be to just reduce the timeout to e.g. 30s/1m. That would tolerate some flakiness, but not entire rollout sequences that are still in progress.

Side note: In general we probably want to do the same for VerifyClusterCondition / VerifyClusterAvailable

e2e: change behavior of VerifyMachinesReady to verify once after a successful list of machines

e628cd2

k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. area/e2e-testing Issues or PRs related to e2e testing labels Nov 20, 2025

k8s-ci-robot requested review from elmiko and richardcase November 20, 2025 12:46

k8s-ci-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Nov 20, 2025

sbueringer reviewed Nov 20, 2025

chrischdi wants to merge 1 commit into kubernetes-sigs:main from chrischdi:pr-verify-machines-ready-not-wait


